prolfquapp - Streamlining Protein Differential Expression Analysis in Core Facilities

Witold Wolski1,2, Jonas Grossmann1,2, Paolo Nanni1, Bernd Roschitzki1, Claudia Fortes1, Christian Panse1,2, Ralph Schlapbach1

1 Functional Genomics Center Zurich - ETH Zurich/University of Zurich (https://www.fgcz.ch/); 2 Swiss Institute of Bioinformatics (https://www.sib.swiss/)

Introduction

  • Protein differential expression analysis (DEA) for DIANN, FragPipe DDA, FragPipe TMT, MaxQuant outputs, or MSstats inputs.
  • Uses preprocessing and statistical models implemented in the R package prolfqua
    doi.org/10.1021/acs.jproteome.2c00441
  • Generates dynamic HTML reports
  • Exports results as XLSX files, .rnk and .txt files for GSEA and ORA
  • Archived analyses can easily be replicated on any system running R (>= 4.1)

How To

Install R and prolfquapp

install.packages('remotes')
remotes::install_github('wolski/prolfquapp', dependencies = TRUE)

Create a directory with:

  • config.yaml (parameter file)
  • dataset.csv (experimental design)
  • the FASTA file
  • DIANN, FragPipe or MaxQuant results
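Before starting the analysis it can help to verify that the directory is complete. The following is a minimal sketch in base R, not part of prolfquapp: the helper check_workdir() is hypothetical, and the FASTA file is assumed to be recognizable by a .fasta, .fa or .fas extension.

```r
# Hypothetical helper (not a prolfquapp function): report which of the
# expected inputs are present in a working directory.
check_workdir <- function(dir = ".") {
  files <- list.files(dir)
  c(
    config  = "config.yaml" %in% files,
    dataset = "dataset.csv" %in% files,
    # Assumption: the FASTA file is identified by its extension.
    fasta   = any(grepl("\\.(fasta|fa|fas)$", files, ignore.case = TRUE))
  )
}

# Demonstration on a temporary directory with empty placeholder files.
wd <- file.path(tempdir(), "dea_demo")
dir.create(wd, showWarnings = FALSE)
invisible(file.create(file.path(wd, c("config.yaml", "dataset.csv", "proteome.fasta"))))

status <- check_workdir(wd)
print(status)
```

The check does not cover the quantification results, because their file names depend on the software used (DIANN, FragPipe or MaxQuant).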

Copy the R code into the working directory by running one of the functions:

The content of the working directory is:

Finally, run source("FP_DIA.R") from the R console, or execute Rscript FP_DIA.R from the command line. This creates a subfolder with the DEA results:

  • DE_Groups_vs_Controls.html report, which describes the main steps of the analysis and shows the results
  • DE_Groups_vs_Controls.xlsx, which contains the raw and transformed abundances, the annotations, and the results of the differential expression analysis
  • .rnk and .txt files for GSEA and ORA analysis
  • Diagnostic plots for each protein (boxplots, line plots of peptide abundances)
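To illustrate the .rnk output: a rank file for preranked GSEA is a tab-separated text file with one identifier and one ranking score per line. The sketch below writes such a file in base R. The column names protein_Id and statistic, the toy values, and the choice of ranking score are assumptions for illustration; the score actually exported by prolfquapp may differ.

```r
# Toy DEA result table; identifiers and scores are invented for illustration.
res <- data.frame(
  protein_Id = c("P1", "P2", "P3", "P4"),
  statistic  = c(2.5, -1.2, NA, 0.3)
)

# Drop proteins without a score and sort by decreasing score.
ranked <- res[!is.na(res$statistic), ]
ranked <- ranked[order(ranked$statistic, decreasing = TRUE), ]

# Write a two-column, tab-separated file without header or quotes.
rnk_file <- file.path(tempdir(), "demo.rnk")
write.table(ranked, rnk_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

rnk_lines <- readLines(rnk_file)
print(rnk_lines)
```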

The entire working directory including input data, R code and results is archived. You can unzip it later and replicate the analysis using your R installation.

Analysis parameters

The config.yaml file specifies the parameters of the analysis:

  • project-related information (e.g. projectID), which is shown in the HTML report
  • aggregation method
    (medpolish, rlm, top_3)
  • abundance transformation
    (robscale, vsn, none)
  • FDR and effect size thresholds
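To make the aggregation options concrete, the sketch below estimates protein abundances from toy log2 peptide intensities in two ways: with Tukey's median polish (stats::medpolish) and with a simple top-3 rule. This is an illustration in base R, not the prolfqua implementation. The data are invented, the helper top_n_mean() is hypothetical, and the top-3 rule shown here (mean of the three most intense peptides per sample) is one common variant that may differ from what prolfqua does.

```r
# Toy log2 peptide intensities of one protein:
# rows are peptides, columns are samples.
pep <- matrix(
  c(20.1, 20.4, 21.0, 21.3,
    18.0, 18.2, 19.1, 19.2,
    22.5, 22.6, 23.4, 23.8,
    19.0, 19.3,   NA, 20.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pep", 1:4), paste0("S", 1:4))
)

# Median polish decomposes the matrix into
# overall + peptide effect + sample effect + residual.
mp <- stats::medpolish(pep, na.rm = TRUE, trace.iter = FALSE)

# Protein abundance per sample: overall level plus sample (column) effect.
protein_medpolish <- mp$overall + mp$col

# Hypothetical helper: mean of the n most intense peptides in one sample.
top_n_mean <- function(x, n = 3) {
  x <- sort(x, decreasing = TRUE)  # sort() drops missing values
  mean(x[seq_len(min(n, length(x)))])
}
protein_top3 <- apply(pep, 2, top_n_mean)

print(round(rbind(medpolish = protein_medpolish, top_3 = protein_top3), 2))
```

Median polish uses all peptides and is robust to single outlying measurements, whereas the top-3 rule depends only on the most intense peptides.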

Sample annotation

The dataset.csv file contains the information about the measured samples:

  • Relative.Path/Path/raw.file/channel (unique)
  • name - used in plots and figures (unique)
  • group/experiment - main factor
  • subject/bioreplicate (optional) - blocking factor
  • control - used to specify the control condition (C) (optional)

The column names are not case sensitive.

If subject is specified, the model is abundance ~ group + subject; otherwise it is abundance ~ group. The group differences to compute are determined from the group and control columns. The MSstats annotation.csv and dataset.csv are similar.
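The rules above can be sketched in base R as follows. This is an illustration only, not the prolfquapp code: the sample table and abundances are invented, and the label "T" for non-control samples as well as the contrast naming scheme are assumptions.

```r
# Toy sample annotation with mixed-case column names.
annot <- data.frame(
  Raw.File = paste0("run", 1:6, ".raw"),
  Name     = paste0("sample", 1:6),
  Group    = rep(c("WT", "KO_A", "KO_B"), each = 2),
  Subject  = rep(c("s1", "s2"), times = 3),
  Control  = rep(c("C", "T", "T"), each = 2)  # "T" is an assumed label
)

# Column names are not case sensitive, so normalize them first.
names(annot) <- tolower(names(annot))

# Group differences: every non-control group versus the control group.
control_group <- unique(annot$group[annot$control == "C"])
other_groups  <- setdiff(unique(annot$group), control_group)
contrasts     <- paste0(other_groups, "_vs_", control_group)

# The model includes the blocking factor only if subject is specified.
model_formula <- if ("subject" %in% names(annot)) {
  abundance ~ group + subject
} else {
  abundance ~ group
}

# Simulated abundances of one protein: KO_A is shifted by 1 log2 unit.
set.seed(1)
annot$abundance <- 20 + (annot$group == "KO_A") + rnorm(6, sd = 0.1)

# Use the control group as reference level so that the group coefficients
# directly estimate the differences to the control.
annot$group <- relevel(factor(annot$group), ref = control_group)
fit <- lm(model_formula, data = annot)

print(contrasts)
print(coef(fit)[paste0("group", other_groups)])
```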

HTML Report

  • Project related information (project ID etc)
  • Primer on DEA
  • Sums up the design of the experiment
  • Summaries of protein identification and quantification:
    missingness, CV, clustering, PCA
  • DEA results with volcano plots and tables (interactively linked to each other)
  • Explains the output formats and gives pointers to follow-up analyses (GSEA, ORA)

Summary

  • Integrates into a LIMS system
    doi.org/10.1515/jib-2022-0031
  • The archived working directory contains the results and all the data needed to replicate the analysis on your PC
  • User-friendly data formats (XLSX, txt, rnk)